Estimating Semi-Parametric Missing Values with Iterative Imputation
Author
Abstract
In this paper, the author designs an efficient method for iteratively imputing missing target values with semi-parametric kernel regression imputation, known as the semi-parametric iterative imputation algorithm (SIIA). When there is little prior knowledge of the dataset, the proposed iterative imputation method, which imputes each missing value several times until the algorithm converges in each model, makes use of a substantial amount of useful information. This information includes the instances that contain missing values, and it captures the real distribution of the dataset more easily than parametric or nonparametric imputation techniques do. Experimental results show that the author’s imputation methods outperform the existing methods in terms of imputation accuracy, in particular when the missing ratio is high.
DOI: 10.4018/jdwm.2010070101. International Journal of Data Warehousing and Mining, 6(3), 1-10, July-September 2010. Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Missing value imputation seeks an efficient way to “guess” the missing values based on other information in the dataset. One advantage of this approach is that the treatment of missing values is independent of the learning algorithm used, which allows users to select the most suitable imputation method for their applications. Commonly used imputation methods for missing values include parametric regression imputation and non-parametric regression imputation. However, real-world data contain other kinds of relations, and neither parametric nor non-parametric imputation is adequate to capture them. That is, we may know part of the relation between the independent variables (condition attributes) and the dependent variable (target attribute), which we can treat as a parametric model, but have no knowledge of the relation between the remaining independent variables and the dependent variable, which we can treat as a nonparametric model. Taken together, the compound relation is difficult to describe with either a parametric or a nonparametric model alone. Moreover, this case is very common in real applications.
In this paper, we regard a relation containing both models as a semi-parametric model, or partial parametric model. In real applications, a semi-parametric model is more natural than a non-parametric model because users usually know some, but not all, information about the dataset, such as some of its parameters. To model this semi-parametric relation, we design an efficient semi-parametric iterative imputation method (SIIA) that combines the advantages of parametric and purely non-parametric models so as to overcome the shortcomings of each.
In the remaining sections, we first review the existing literature on dealing with missing values. We then design iterative imputation methods that can impute missing values with a kernel method, even in datasets with a high missing ratio. After that, we evaluate the proposed methods in a range of experiments. Finally, we conclude our work and put forward future work.
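The idea of iteratively re-imputing missing target values under a semi-parametric relation can be illustrated with a minimal sketch. It assumes the relation takes the partially linear form y = x·beta + g(z) + noise, where x holds the attributes with a known (linear) relation to the target and z is a single attribute whose relation g is unknown. This section does not give the details of SIIA, so the double-residual fit, the Gaussian kernel, the fixed bandwidth, the mean initialization, the stopping rule, and all function names below are assumptions made for illustration only, not the author's algorithm.

```python
import numpy as np


def nw_smooth(z_train, v_train, z_query, bandwidth):
    """Nadaraya-Watson estimate of E[v | z] with a Gaussian kernel (assumed choice)."""
    diff = (z_query[:, None] - z_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diff ** 2)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v_train


def fit_predict_plm(x, z, y, bandwidth):
    """Fit y = x @ beta + g(z) by double residuals and return fitted values.

    x: (n, p) attributes with an assumed linear relation to y.
    z: (n,) attribute with an unknown, kernel-smoothed relation to y.
    """
    # Remove the part of y and x explained by z, then estimate beta by least squares.
    y_res = y - nw_smooth(z, y, z, bandwidth)
    x_res = x - nw_smooth(z, x, z, bandwidth)
    beta, *_ = np.linalg.lstsq(x_res, y_res, rcond=None)
    # Estimate g by smoothing what the linear part leaves unexplained.
    g_hat = nw_smooth(z, y - x @ beta, z, bandwidth)
    return x @ beta + g_hat


def iterative_impute(x, z, y, bandwidth=0.3, tol=1e-6, max_iter=100):
    """Re-impute missing targets (NaN in y) until they stop changing.

    Returns the completed target vector and the number of iterations run.
    """
    missing = np.isnan(y)
    y_filled = y.copy()
    y_filled[missing] = np.nanmean(y)  # start from mean imputation
    for iteration in range(1, max_iter + 1):
        fitted = fit_predict_plm(x, z, y_filled, bandwidth)
        if missing.any():
            change = np.max(np.abs(fitted[missing] - y_filled[missing]))
        else:
            change = 0.0
        y_filled[missing] = fitted[missing]  # observed values are never overwritten
        if change < tol:
            break
    return y_filled, iteration
```

Each pass refits the model on the observed targets together with the current imputations and then replaces only the imputed entries, so instances with missing targets still contribute their x and z values to the fit. A real application would also need a data-driven bandwidth and support for several nonparametric attributes, both of which this sketch leaves out.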
Similar resources
Estimating Semi-Parametric Missing Values with Iterative Imputation
In this paper, the author designs an efficient method for iteratively imputing missing target values with semi-parametric kernel regression imputation, known as the semi-parametric iterative imputation algorithm (SIIA). When there is little prior knowledge of the dataset, the proposed iterative imputation method, which imputes each missing value several times until the algorithm converges in ...
Missing Values Imputation Based on Iterative Learning
Databases for machine learning and data mining often have missing values. How to develop an effective method for missing values imputation is an important problem in the field of machine learning and data mining. In this paper, several methods for dealing with missing values in incomplete data are reviewed, and a new method for missing values imputation based on iterative learning is proposed. The...
Estimating Missing Values Using Mixture Kernel Regression
One of the important problems in data quality is the presence of missing data, so missing data imputation is an important issue in learning from incomplete data. Imputation is a procedure that replaces the missing values in a data set with plausible values. Various techniques have been developed to deal with missing values in data sets with homogeneous attributes. But those approaches are inde...
Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example from National Data on Drug Injection in Prisons
Background Policy makers need models to be able to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets. Presence of missing data challenges the practice of model development. Several studies suggested that performance of imputation methods is acceptable when missing rate is moderate. One of the issues which was of less concern...
Cost-Sensitive Imputing Missing Values with Ordering
Missing value is an unavoidable problem when dealing with real world data sources, and various approaches for dealing with missing data have been developed. In fact, it is very important to consider the imputation ordering (ordering means which missing value should be imputed at first with the help of a specific criterion) during the imputation process, because not all attributes have the same ...